
    Small in-distribution changes in 3D perspective and lighting fool both CNNs and Transformers

    Neural networks are susceptible to small transformations, including 2D rotations and shifts, image crops, and even changes in object colors. This is often attributed to biases in the training dataset and to a lack of 2D shift-invariance due to not respecting the sampling theorem. In this paper, we challenge this hypothesis by training and testing on unbiased datasets, and showing that networks are brittle to both small 3D perspective changes and lighting variations which cannot be explained by dataset bias or lack of shift-invariance. To find these in-distribution errors, we introduce an evolution strategies (ES) based approach, which we call CMA-Search. Despite training with a large-scale (0.5 million images), unbiased dataset of camera and light variations, in over 71% of cases CMA-Search can find camera parameters in the vicinity of a correctly classified image that lead to in-distribution misclassifications with < 3.6% change in parameters. With lighting changes, CMA-Search finds misclassifications in 33% of cases with < 11.6% change in parameters. Finally, we extend this method to find misclassifications in the vicinity of ImageNet images for both ResNet and OpenAI's CLIP model.
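    The abstract describes CMA-Search as an evolution strategies based search over camera (or lighting) parameters around a correctly classified seed image. The sketch below illustrates that idea using the pycma library; `render_scene` and `classifier` are hypothetical stand-ins for the paper's renderer and trained network, and the fitness choice (minimizing the true-class probability) is an assumption, not necessarily the paper's exact objective.

    ```python
    # Minimal sketch of an ES search over camera parameters around a correctly
    # classified seed, in the spirit of CMA-Search. Assumes pycma (pip install cma);
    # render_scene and classifier are hypothetical placeholders.
    import numpy as np
    import cma

    def find_misclassification(seed_camera, true_label, classifier, render_scene,
                               sigma=0.05, budget=500):
        """Perturb camera parameters until the classifier's prediction flips."""
        def fitness(params):
            image = render_scene(params)          # hypothetical renderer
            probs = classifier(image)             # hypothetical model call
            return probs[true_label]              # minimize true-class probability

        es = cma.CMAEvolutionStrategy(np.asarray(seed_camera, dtype=float), sigma,
                                      {"maxfevals": budget, "verbose": -9})
        while not es.stop():
            candidates = es.ask()
            scores = [fitness(c) for c in candidates]
            es.tell(candidates, scores)
            for c in candidates:
                probs = classifier(render_scene(c))
                if int(np.argmax(probs)) != true_label:  # misclassification found
                    return np.asarray(c)
        return None  # no misclassification within the search budget
    ```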

    Learning Visual Importance for Graphic Designs and Data Visualizations

    Knowing where people look and click on visual designs can provide clues about how the designs are perceived, and where the most important or relevant content lies. The most important content of a visual design can be used for effective summarization or to facilitate retrieval from a database. We present automated models that predict the relative importance of different elements in data visualizations and graphic designs. Our models are neural networks trained on human clicks and importance annotations on hundreds of designs. We collected a new dataset of crowdsourced importance, and analyzed the predictions of our models with respect to ground-truth importance and human eye movements. We demonstrate how such predictions of importance can be used for automatic design retargeting and thumbnailing. User studies with hundreds of MTurk participants validate that, with limited post-processing, our importance-driven applications are on par with, or outperform, current state-of-the-art methods, including natural image saliency.
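    The abstract describes neural networks that map a design image to a per-element importance prediction. The sketch below shows one way such a predictor could be set up as per-pixel regression against crowdsourced importance maps; the PyTorch framework, layer sizes, and MSE loss are illustrative assumptions, not the paper's exact architecture or training setup.

    ```python
    # Illustrative sketch: a small fully convolutional network regressing a
    # per-pixel importance map in [0, 1] from a design image. Not the paper's
    # exact model; shapes and loss are placeholder choices.
    import torch
    import torch.nn as nn

    class ImportanceNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            )
            self.decoder = nn.Sequential(
                nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),
            )

        def forward(self, x):                      # x: (B, 3, H, W) design image
            return self.decoder(self.encoder(x))   # (B, 1, H, W) importance map

    # One training step against a placeholder crowdsourced importance map.
    model = ImportanceNet()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    image = torch.rand(1, 3, 256, 256)             # placeholder design image
    target = torch.rand(1, 1, 256, 256)            # placeholder importance annotation
    loss = nn.functional.mse_loss(model(image), target)
    loss.backward()
    optimizer.step()
    ```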

    Additional file 3: Table S3. of Exploiting the recognition code for elucidating the mechanism of zinc finger protein-DNA interactions

    Checking the top predictions (top 50) to establish the relationship between Approach 2 (consensus amino acids and synergistic binding mode) and Approach 1 (consensus amino acid and modular binding mode) predictions for all 16 GNN targets. Our Approach 2 predictions for Finger 3 coincide with the Approach 1 predictions. (DOCX 23 kb)

    Additional file 1: Table S1. of Exploiting the recognition code for elucidating the mechanism of zinc finger protein-DNA interactions

    Detailed analysis of predictions against experimental data for Approaches 1 and 3 for all 16 GNN triplets at different finger positions (1, 2, or 3). The experimental data involve the target DNA sequence that binds to its respective helix3 on the ZFP at various positions (finger 1, finger 2, and finger 3), with the corresponding Kd values for determining the experimental affinity between the DNA and its respective ZFP. Approach 3 (all possible amino acids and modular binding mode) gives its top helix for the experimental DNA target with its rank and score, respectively. Approach 1 (consensus amino acid and modular binding mode) gives its top helix for the experimental DNA target with its rank and IHBE score, respectively. Approach 1 predicts zinc finger helices for experimental DNA targets accurately for Finger 3, followed by Finger 1, whereas Approach 3 does so for Finger 2. (DOCX 56 kb)

    Effects of title wording on memory of trends in line graphs

    Graphs and data visualizations can give us a visual sense of trends on topics ranging from poverty and the spread of diseases to the popularity of products. What makes graphs useful is our ability to perceive these trends at a glance. Related work has investigated the effect of different properties of graphs, including axis scaling, the choice of encoding, and the presence of pictographic elements (e.g., Haroz et al. 2015), on the perception of trends or the remembered size of the quantities depicted. Previous work has shown that visual attention is directed towards text, and specifically titles, which can affect what is recalled from memory (Borkin, Bylinskii, et al. 2016; Matzen et al. 2017). In a more controlled setting, we investigate how the wording of a line graph's title impacts memory of the trend's slope. We designed a set of experiments that consist of first showing participants a simple graph with an increasing or decreasing trend, paired with a title that is either strongly stated ("Contraceptive use in Senegal skyrockets") or more neutral ("Contraceptive use in Senegal rises"). To avoid rehearsal, participants then performed a challenging task before being asked to recall the title and answer a question about the graph's initial/final value or an extrapolated value. Can we change a participant's memory of a graph by modifying some accompanying text? These experiments bear resemblance to the eyewitness testimony experiments by Loftus et al. (1996). In some conditions, the strength of the wording in the title affects how participants recall the trend from memory, but this effect is not universal across experiments. Results of these experiments have important implications for how text interacts with long-term visual memory and may bias future inferences.
    References:
    Haroz, S., Kosara, R., & Franconeri, S. L. (2015). Isotype visualization: Working memory, performance, and engagement with pictographs. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems. DOI: https://doi.org/10.1145/2702123.2702275
    Borkin, M. A., Bylinskii, Z., Kim, N. W., Bainbridge, C. M., Yeh, C. S., Borkin, D., ... & Oliva, A. (2016). Beyond memorability: Visualization recognition and recall. IEEE Transactions on Visualization and Computer Graphics, 22(1), 519-528.

    On the Capability of Neural Networks to Generalize to Unseen Category-Pose Combinations

    Recognizing an object's category and pose lies at the heart of visual understanding. Recent works suggest that deep neural networks (DNNs) often fail to generalize to category-pose combinations not seen during training. However, it is unclear when and how such generalization may be possible. Does the number of combinations seen during training impact generalization? Is it better to learn category and pose in separate networks, or in a single shared network? Furthermore, what are the neural mechanisms that drive the network's generalization? In this paper, we answer these questions by analyzing state-of-the-art DNNs trained to recognize both object category and pose (position, scale, and 3D viewpoint) with quantitative control over the number of category-pose combinations seen during training. We also investigate the emergence of two types of specialized neurons that can explain generalization to unseen combinations: neurons selective to category and invariant to pose, and vice versa. We perform experiments on MNIST extended with position or scale, the iLab dataset with vehicles at different viewpoints, and a challenging new dataset for car model recognition and viewpoint estimation that we introduce in this paper, the Biased-Cars dataset. Our results demonstrate that as the number of combinations seen during training increases, networks generalize better to unseen category-pose combinations, facilitated by an increase in the selectivity and invariance of individual neurons. We find that learning category and pose in separate networks compared to a shared one leads to an increase in such selectivity and invariance, as separate networks are not forced to preserve information about both category and pose. This enables separate networks to significantly outperform shared ones at predicting unseen category-pose combinations. This material is based upon work supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216.
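    The abstract points to measuring how selective individual neurons are to category and how invariant they are to pose. The sketch below shows one simple, variance-ratio style way to compute such indices from a unit's mean activations over a category x pose grid; this particular metric is an illustrative assumption, not necessarily the exact measure used in the paper.

    ```python
    # Illustrative sketch: quantify category selectivity and pose invariance of a
    # single unit from its mean responses over all category-pose combinations.
    # The variance-ratio index is an assumption for illustration only.
    import numpy as np

    def selectivity_and_invariance(activations):
        """activations: array of shape (n_categories, n_poses) holding one unit's
        mean response to each category-pose combination."""
        across_category = activations.mean(axis=1).var()  # variation with category
        across_pose = activations.mean(axis=0).var()       # variation with pose
        total = across_category + across_pose + 1e-12
        category_selectivity = across_category / total     # high: category-tuned
        pose_invariance = 1.0 - across_pose / total         # high: pose-invariant
        return category_selectivity, pose_invariance

    # Example: a unit that responds to one category regardless of pose.
    unit = np.zeros((10, 8))
    unit[3, :] = 1.0
    print(selectivity_and_invariance(unit))  # approximately (1.0, 1.0)
    ```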